[ET-VK] Add alignment fields to PackedDimInfo for padded size calculation #17260
Merged
Conversation
added 2 commits
February 5, 2026 10:21
… per-shader timing

Pull Request resolved: #17105

This change improves the benchmark test harness in three ways:

1. **Reference computation caching**: Test cases are now grouped by a `ReferenceKey` that captures the inputs affecting the reference output (sizes, dtype, data generation type). Reference computation runs once per group and the results are reused, significantly speeding up test suites with many storage/layout variations of the same logical test case.
2. **Per-shader timing breakdown**: Benchmark output now shows individual shader execution times with global and local workgroup sizes, making it easier to identify performance bottlenecks when multiple shaders participate in an operator.
3. **Deferred data generation**: Tensor data is now generated lazily with explicit seeding, enabling deterministic data sharing across grouped test cases. This ensures identical inputs produce identical reference outputs for caching correctness.

Also adds string input support (`ValueSpec::make_string()`) and helper functions for concise test case naming (`layout_abbrev`, `repr_str`, `shape_string`).

ghstack-source-id: 338638546
@exported-using-ghexport

Differential Revision: [D91945038](https://our.internmc.facebook.com/intern/diff/D91945038/)
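To make the caching idea in item 1 concrete, here is a minimal sketch of grouping reference computation by key. The struct fields, cache container, and `get_or_compute_reference` helper below are illustrative assumptions, not the actual harness code; only the `ReferenceKey` name and the notion of keying on sizes/dtype/data generation come from the commit message.

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical key: captures everything that affects the reference output.
struct ReferenceKey {
  std::vector<int64_t> sizes;  // input tensor sizes
  int32_t dtype;               // data type enum value
  int32_t data_gen;            // data generation strategy (e.g. random, ones)

  bool operator<(const ReferenceKey& other) const {
    return std::tie(sizes, dtype, data_gen) <
           std::tie(other.sizes, other.dtype, other.data_gen);
  }
};

using ReferenceOutput = std::vector<float>;

// Compute the reference output once per key and reuse it for every
// storage/layout variant of the same logical test case.
template <typename ComputeFn>
const ReferenceOutput& get_or_compute_reference(
    std::map<ReferenceKey, ReferenceOutput>& cache,
    const ReferenceKey& key,
    ComputeFn compute) {
  auto it = cache.find(key);
  if (it == cache.end()) {
    it = cache.emplace(key, compute()).first;
  }
  return it->second;
}
```

With deterministic, seeded data generation (item 3), every test case that maps to the same key sees identical inputs, so reusing the cached reference output is safe.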
…tion

Pull Request resolved: #17170

This change introduces separate alignment fields to `PackedDimInfo`, decoupling the alignment used for padding tensor dimensions from the block size used for packing.

Previously, `calculate_padded_sizes` used `packed_dim_block_size` and `outer_packed_dim_block_size` directly to determine how much to pad tensor dimensions. This works, but it limits flexibility: there are scenarios where we want to pad dimensions to a larger alignment than the block size for performance reasons, such as ensuring loads are aligned to cache lines or removing the need for bounds checking in shaders.

The new fields `packed_dim_align` and `outer_packed_dim_align` allow the alignment to be specified independently. For now, these are initialized to match the corresponding block sizes, preserving existing behavior. Future changes can set larger alignment values when beneficial for performance.

Authored with Claude.

ghstack-source-id: 338638551
@exported-using-ghexport

Differential Revision: [D92196649](https://our.internmc.facebook.com/intern/diff/D92196649/)
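The sketch below shows how an alignment field could drive padded-size calculation independently of the packing block size. The actual `PackedDimInfo` layout and `calculate_padded_sizes` implementation in the ET-VK runtime may differ; only the field names `packed_dim_align` and `outer_packed_dim_align` are taken from this PR, everything else is an assumption for illustration.

```cpp
#include <cstdint>
#include <vector>

namespace {

// Round `size` up to the next multiple of `alignment`.
int64_t align_up(int64_t size, int64_t alignment) {
  return (size + alignment - 1) / alignment * alignment;
}

// Simplified stand-in for PackedDimInfo; real struct may hold more state.
struct PackedDimInfoSketch {
  int32_t packed_dim;                   // index of the packed dimension
  int32_t packed_dim_block_size;        // block size used for packing
  int32_t packed_dim_align;             // alignment used for padding
  int32_t outer_packed_dim;             // index of the outer packed dimension
  int32_t outer_packed_dim_block_size;
  int32_t outer_packed_dim_align;
};

// Pad the packed and outer-packed dims to their alignment rather than to the
// block size; with align == block size this reproduces the previous behavior.
std::vector<int64_t> calculate_padded_sizes_sketch(
    std::vector<int64_t> sizes,
    const PackedDimInfoSketch& info) {
  sizes.at(info.packed_dim) =
      align_up(sizes.at(info.packed_dim), info.packed_dim_align);
  sizes.at(info.outer_packed_dim) =
      align_up(sizes.at(info.outer_packed_dim), info.outer_packed_dim_align);
  return sizes;
}

}  // namespace
```

For example, a packed dimension of size 7 with a block size of 4 pads to 8 today; raising `packed_dim_align` to 16 would pad it to 16 without changing how elements are packed into blocks, which is the flexibility this PR is setting up.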
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17260
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 119 Pending, as of commit 694f9b8 with merge base 1cffd23.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…perators (#17261)

Implemented quantize_per_tensor and dequantize_per_tensor GLSL shaders and C++ dispatch logic to support the new single-dimension packed INT8 layouts (kPackedInt8_4W, kPackedInt8_4C, kPackedInt8_4H). These operators enable conversion between floating-point tensors and packed int8 representations with per-tensor scale and zero-point parameters.

The implementation includes:

- GLSL shaders: quantize_per_tensor and dequantize_per_tensor with support for both texture->buffer and buffer->buffer data flows, including GL_EXT_debug_printf statements for debugging
- QuantizeDequantize.cpp: Added dispatch functions for the new layouts and registered the etvk.q_dq_8bit_per_tensor.default operator
- Test infrastructure: Created a q_dq_8bit_per_tensor test binary with DEBUG_MODE support and a reference CPU implementation for validation

The shaders implement the quantization formula Q = clamp(round(x / scale) + zp, -128, 127) and the dequantization formula x' = (Q - zp) * scale, with proper int8 packing/unpacking using little-endian byte ordering and sign extension.

Differential Revision: [D92061370](https://our.internmc.facebook.com/intern/diff/D92061370/)

[ghstack-poisoned]
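As a quick illustration of the two formulas quoted above, the following CPU sketch round-trips a value through quantize and dequantize. It is not the actual ET-VK reference implementation; the function names and signatures here are assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Q = clamp(round(x / scale) + zp, -128, 127)
int8_t quantize_per_tensor_ref(float x, float scale, int32_t zp) {
  const int32_t q = static_cast<int32_t>(std::round(x / scale)) + zp;
  return static_cast<int8_t>(std::clamp<int32_t>(q, -128, 127));
}

// x' = (Q - zp) * scale
float dequantize_per_tensor_ref(int8_t q, float scale, int32_t zp) {
  return static_cast<float>(static_cast<int32_t>(q) - zp) * scale;
}

// Round-trip a tensor's worth of values; useful for validating GPU output
// against a CPU baseline.
std::vector<float> q_dq_reference(
    const std::vector<float>& input, float scale, int32_t zp) {
  std::vector<float> out;
  out.reserve(input.size());
  for (float x : input) {
    out.push_back(
        dequantize_per_tensor_ref(quantize_per_tensor_ref(x, scale, zp), scale, zp));
  }
  return out;
}
```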
SS-JIA approved these changes on Feb 5, 2026.
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #17170 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/405/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/405/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/398/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/405/orig
Differential Revision: D92196649
@diff-train-skip-merge